Method based on EM algorithm for estimating word translation probabilities in Thai – English machine translation

Authors

CHUTCHADA NUSAI

YOSHIMI SUZUKI

HARUAKI YAMAZAKI

Abstract

Selecting, from a set of candidate target-language words, the translation that conveys the correct sense of the source word and yields more fluent target-language output is one of the core problems in machine translation. In this paper we compare three methods of estimating word translation probabilities for selecting word translations in Thai – English machine translation: (1) a method based on the frequency of word translations, (2) a method based on the collocation of word translations, and (3) a method based on the Expectation Maximization (EM) algorithm. For evaluation we used Thai – English parallel sentences generated by NECTEC. The EM-based method performed best of the three and gave satisfactory results.

Key-Words: Machine translation, EM algorithm
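
The abstract does not say which model underlies the EM-based method. As a minimal sketch only, assuming an IBM Model 1 style lexical model (an assumption, not confirmed by the source), the code below shows how EM can estimate word translation probabilities P(target word | source word) from parallel sentences. The function name `train_em`, the iteration count, and the toy corpus with placeholder source tokens A to D are all hypothetical and are not the NECTEC data.

```python
from collections import defaultdict


def train_em(pairs, iterations=10):
    """Estimate P(target | source) with EM, IBM Model 1 style (assumed model).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict-like mapping (source_word, target_word) -> probability.
    """
    target_vocab = {e for _, tgt in pairs for e in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (source, target) links
        total = defaultdict(float)  # expected count of links per source word

        # E-step: share each target word among the source words of its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(s, e)] for s in src)
                for s in src:
                    frac = t[(s, e)] / norm
                    count[(s, e)] += frac
                    total[s] += frac

        # M-step: renormalise expected counts into probabilities per source word.
        t = defaultdict(float, {(s, e): c / total[s] for (s, e), c in count.items()})

    return t


# Hypothetical toy corpus: placeholder source tokens, English target tokens.
corpus = [
    (["A", "B"], ["the", "house"]),
    (["A", "C"], ["the", "book"]),
    (["D", "C"], ["a", "book"]),
]
probs = train_em(corpus)
targets = ["the", "house", "book", "a"]
for source_word in ["A", "B", "C", "D"]:
    best = max(targets, key=lambda e: probs[(source_word, e)])
    print(source_word, "->", best)
```

A frequency-based baseline would instead count co-occurring translations directly. EM differs in that it redistributes those counts on every pass according to the current probability estimates.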

Similar resources

Estimating Word Translation Probabilities for Thai – English Machine Translation using EM Algorithm

Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm

Selecting the right word translation among several options in the lexicon is a core problem for machine translation. We present a novel approach to this problem that can be trained using only unrelated monolingual corpora and a lexicon. By estimating word translation probabilities using the EM algorithm, we extend upon target language modeling. We construct a word translation model for German an...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Using Comparable Corpora to Adapt a Translation Model to Domains

Statistical machine translation (SMT) requires a large parallel corpus, which is available only for restricted language pairs and domains. To expand the language pairs and domains to which SMT is applicable, we created a method for estimating translation pseudo-probabilities from bilingual comparable corpora. The essence of our method is to calculate pairwise correlations between the words asso...

Improvement of Statistical Machine Translation using Character-Based Segmentation with Monolingual and Bilingual Information

We present a novel segmentation approach for Phrase-Based Statistical Machine Translation (PB-SMT) for languages in which word boundaries are not clearly marked. The approach uses both monolingual and bilingual information, and we demonstrate that (1) an unsegmented corpus can provide nearly identical results compared to a manually segmented corpus in the PB-SMT task when a good heuristic character clustering...

Publication date: 2007
